
    The Maximum Exposure Problem

    Given a set of points P and a set of axis-aligned rectangles R in the plane, a point p in P is called exposed if it lies outside all rectangles in R. In the max-exposure problem, given an integer parameter k, we want to delete k rectangles from R so as to maximize the number of exposed points. We show that the problem is NP-hard and, assuming plausible complexity conjectures, also hard to approximate even when the rectangles in R are translates of two fixed rectangles. However, if R consists only of translates of a single rectangle, we present a polynomial-time approximation scheme. For general rectangle range spaces, we present a simple O(k) bicriteria approximation algorithm; that is, by deleting O(k^2) rectangles, we can expose at least Omega(1/k) of the optimal number of points.
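
    The objective is easy to state directly in code. Below is a minimal brute-force sketch (a toy, not the paper's PTAS or bicriteria algorithm) that, for a small instance, tries every size-k deletion set and counts the exposed points; all names and the rectangle encoding (x1, y1, x2, y2) are illustrative assumptions.

        from itertools import combinations

        def inside(p, r):
            # r = (x1, y1, x2, y2), axis-aligned; p = (x, y)
            x, y = p
            x1, y1, x2, y2 = r
            return x1 <= x <= x2 and y1 <= y <= y2

        def exposed_count(points, rects, deleted):
            # Count points lying outside every rectangle that was not deleted.
            remaining = [r for i, r in enumerate(rects) if i not in deleted]
            return sum(all(not inside(p, r) for r in remaining) for p in points)

        def max_exposure_bruteforce(points, rects, k):
            # Try every size-k deletion set; exponential in k, illustration only.
            return max((exposed_count(points, rects, set(d)), d)
                       for d in combinations(range(len(rects)), k))

        if __name__ == "__main__":
            P = [(1, 1), (3, 3), (5, 5)]
            R = [(0, 0, 2, 2), (2, 2, 4, 4), (0, 0, 6, 6)]
            print(max_exposure_bruteforce(P, R, 1))  # deleting the big rectangle exposes (5, 5)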

    JanusAQP: Efficient Partition Tree Maintenance for Dynamic Approximate Query Processing

    Approximate query processing over dynamic databases, i.e., databases under insertions and deletions, has applications ranging from high-frequency trading to internet-of-things analytics. We present JanusAQP, a new dynamic AQP system, which supports SUM, COUNT, AVG, MIN, and MAX queries under insertions and deletions to the dataset. JanusAQP extends static partition tree synopses, which are hierarchical aggregations of datasets, into the dynamic setting. This paper contributes new methods for: (1) efficient initialization of the data synopsis in the presence of incoming data, (2) maintenance of the data synopsis under insertions/deletions, and (3) re-optimization of the partitioning to reduce the approximation error. JanusAQP reduces the error of a state-of-the-art baseline by more than 60% using only 10% of the storage cost. JanusAQP can process more than 100K updates per second in a single-node setting while keeping query latency at the millisecond level.
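
    As a rough illustration of the kind of synopsis being maintained, the sketch below keeps a flat one-dimensional partition with exact per-bucket COUNT/SUM that is cheap to update under insertions and deletions, and answers approximate range-SUM queries by assuming uniformity inside partially covered buckets. This is an assumption-laden toy with illustrative names, not JanusAQP's hierarchical partition tree, and it omits MIN/MAX maintenance and re-optimization.

        import bisect

        class PartitionSynopsis:
            def __init__(self, edges):
                # edges: sorted bucket boundaries, e.g. [0, 10, 20, 30]
                self.edges = edges
                self.count = [0] * (len(edges) - 1)
                self.total = [0.0] * (len(edges) - 1)

            def _bucket(self, x):
                b = bisect.bisect_right(self.edges, x) - 1
                return max(0, min(b, len(self.count) - 1))

            def insert(self, x):
                b = self._bucket(x)
                self.count[b] += 1
                self.total[b] += x

            def delete(self, x):
                b = self._bucket(x)
                self.count[b] -= 1
                self.total[b] -= x

            def approx_sum(self, lo, hi):
                # Fully covered buckets contribute exactly; partially covered
                # buckets contribute in proportion to the overlap (uniformity).
                s = 0.0
                for b in range(len(self.count)):
                    left, right = self.edges[b], self.edges[b + 1]
                    overlap = max(0.0, min(hi, right) - max(lo, left))
                    if overlap > 0:
                        s += self.total[b] * overlap / (right - left)
                return s

        if __name__ == "__main__":
            syn = PartitionSynopsis([0, 10, 20, 30])
            for v in (1, 5, 12, 18, 25):
                syn.insert(v)
            syn.delete(12)
            print(syn.approx_sum(5, 25))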

    A Fair and Memory/Time-efficient Hashmap

    There is a large amount of work constructing hashmaps to minimize the number of collisions. However, to the best of our knowledge, no known hashing technique guarantees group fairness among different groups of items. We are given a set $P$ of $n$ tuples in $\mathbb{R}^d$, for a constant dimension $d$, and a set of groups $\mathcal{G}=\{\mathbf{g}_1,\ldots,\mathbf{g}_k\}$ such that every tuple belongs to a unique group. We formally define the fair hashing problem, introducing the notions of single fairness ($\Pr[h(p)=h(x)\mid p\in \mathbf{g}_i,\, x\in P]$ for every $i=1,\ldots,k$), pairwise fairness ($\Pr[h(p)=h(q)\mid p,q\in \mathbf{g}_i]$ for every $i=1,\ldots,k$), and the well-known collision probability ($\Pr[h(p)=h(q)\mid p,q\in P]$). The goal is to construct a hashmap such that the collision probability, the single fairness, and the pairwise fairness are all close to $1/m$, where $m$ is the number of buckets in the hashmap. We propose two families of algorithms for designing fair hashmaps. First, we focus on hashmaps with optimum memory consumption that minimize the unfairness. We model the input tuples as points in $\mathbb{R}^d$, and the goal is to find a vector $w$ such that the projection of $P$ onto $w$ creates an ordering that is convenient to split in order to create a fair hashmap. For each projection we design efficient algorithms that find near-optimum partitions of exactly (or at most) $m$ buckets. Second, we focus on hashmaps with optimum fairness ($0$-unfairness) that minimize the memory consumption. We make the important observation that the fair hashmap problem reduces to the necklace splitting problem. By carefully implementing algorithms for solving the necklace splitting problem, we propose faster algorithms constructing hashmaps with $0$-unfairness using $2(m-1)$ boundary points when $k=2$ and $k(m-1)(4+\log_2(3mn))$ boundary points for $k>2$.
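
    The three quantities defined above can be estimated empirically for any hash function. The sketch below (illustrative names, exact enumeration over all pairs, so suitable only for tiny inputs) measures the collision probability, the per-group single fairness, and the per-group pairwise fairness of a hash function h with m buckets; it is a way to check the definitions, not one of the construction algorithms from the paper.

        def fairness_stats(items, groups, h, m):
            # items: list of hashable tuples; groups: parallel list of group labels
            buckets = [h(x) % m for x in items]
            n = len(items)

            def pair_collision(idx_a, idx_b):
                # Empirical Pr[h(p) = h(q)] for p from idx_a, q from idx_b, p != q
                pairs = [(i, j) for i in idx_a for j in idx_b if i != j]
                return sum(buckets[i] == buckets[j] for i, j in pairs) / len(pairs)

            all_idx = list(range(n))
            collision = pair_collision(all_idx, all_idx)
            single, pairwise = {}, {}
            for g in set(groups):
                idx = [i for i in all_idx if groups[i] == g]
                single[g] = pair_collision(idx, all_idx)  # p in group g, x in P
                pairwise[g] = pair_collision(idx, idx)    # p, q both in group g
            return collision, single, pairwise

        if __name__ == "__main__":
            pts = [(i, 2 * i) for i in range(20)]
            grp = ["a" if i < 10 else "b" for i in range(20)]
            # All three statistics should ideally be close to 1/m = 0.25 here.
            print(fairness_stats(pts, grp, hash, m=4))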

    Computing Shortest Paths in the Plane with Removable Obstacles

    We consider the problem of computing a Euclidean shortest path in the presence of removable obstacles in the plane. In particular, we have a collection of pairwise-disjoint polygonal obstacles, each of which may be removed at some cost c_i > 0. Given a cost budget C > 0 and a pair of points s, t, which obstacles should be removed to minimize the path length from s to t in the remaining workspace? We show that this problem is NP-hard even if the obstacles are vertical line segments. Our main result is a fully polynomial-time approximation scheme (FPTAS) for the case of convex polygons. Specifically, we compute a (1 + epsilon)-approximate shortest path in time O((nh/epsilon^2) log n log(n/epsilon)) with removal cost at most (1 + epsilon)C, where h is the number of obstacles, n is the total number of obstacle vertices, and epsilon in (0, 1) is a user-specified parameter. Our approximation scheme also solves a shortest path problem for a stochastic model of obstacles, where each obstacle's presence is an independent event with a known probability. Finally, we also present a data structure that can answer s-t path queries in polylogarithmic time, for any pair of points s, t in the plane.
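
    To make the budgeted removal trade-off concrete, here is a toy discretization (an assumption-heavy sketch with illustrative names, not the paper's FPTAS and not a continuous Euclidean shortest path): obstacles are sets of blocked grid cells with removal costs, and every obstacle subset within the budget C is tried with a breadth-first search.

        from collections import deque
        from itertools import combinations

        def bfs_len(blocked, size, s, t):
            # Shortest 4-connected path length on a size x size grid, or None.
            if s in blocked or t in blocked:
                return None
            dist, q = {s: 0}, deque([s])
            while q:
                cell = q.popleft()
                if cell == t:
                    return dist[cell]
                x, y = cell
                for nxt in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
                    if (0 <= nxt[0] < size and 0 <= nxt[1] < size
                            and nxt not in blocked and nxt not in dist):
                        dist[nxt] = dist[cell] + 1
                        q.append(nxt)
            return None

        def best_removal(obstacles, costs, size, s, t, budget):
            # obstacles: list of sets of blocked cells; costs: parallel removal costs.
            best_len, best_set = None, None
            for r in range(len(obstacles) + 1):
                for rem in combinations(range(len(obstacles)), r):
                    if sum(costs[i] for i in rem) > budget:
                        continue
                    blocked = set()
                    for i, cells in enumerate(obstacles):
                        if i not in rem:
                            blocked |= cells
                    d = bfs_len(blocked, size, s, t)
                    if d is not None and (best_len is None or d < best_len):
                        best_len, best_set = d, rem
            return best_len, best_set

        if __name__ == "__main__":
            wall = {(2, y) for y in range(5)}  # a vertical wall of blocked cells
            print(best_removal([wall], [3.0], 5, (0, 2), (4, 2), budget=3.0))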

    Computing Data Distribution from Query Selectivities

    We are given a set $\mathcal{Z}=\{(R_1,s_1),\ldots,(R_n,s_n)\}$, where each $R_i$ is a range in $\Re^d$, such as a rectangle or a ball, and $s_i\in[0,1]$ denotes its selectivity. The goal is to compute a small-size discrete data distribution $\mathcal{D}=\{(q_1,w_1),\ldots,(q_m,w_m)\}$, where $q_j\in\Re^d$ and $w_j\in[0,1]$ for each $1\leq j\leq m$, and $\sum_{1\leq j\leq m}w_j=1$, such that $\mathcal{D}$ is the most consistent with $\mathcal{Z}$, i.e., $\mathrm{err}_p(\mathcal{D},\mathcal{Z})=\frac{1}{n}\sum_{i=1}^n \lvert s_i-\sum_{j=1}^m w_j\cdot 1(q_j\in R_i)\rvert^p$ is minimized. In a database setting, $\mathcal{Z}$ corresponds to a workload of range queries over some table, together with their observed selectivities (i.e., the fraction of tuples returned), and $\mathcal{D}$ can be used as a compact model for approximating the data distribution within the table without accessing the underlying contents. In this paper, we obtain both upper and lower bounds for this problem. In particular, we show that the problem of finding the best data distribution from selectivity queries is $\mathsf{NP}$-complete. On the positive side, we describe a Monte Carlo algorithm that constructs, in time $O((n+\delta^{-d})\delta^{-2}\,\mathrm{polylog})$, a discrete distribution $\tilde{\mathcal{D}}$ of size $O(\delta^{-2})$, such that $\mathrm{err}_p(\tilde{\mathcal{D}},\mathcal{Z})\leq \min_{\mathcal{D}}\mathrm{err}_p(\mathcal{D},\mathcal{Z})+\delta$ (for $p=1,2,\infty$), where the minimum is taken over all discrete distributions. We also establish conditional lower bounds, which strongly indicate the infeasibility of relative approximations, as well as of removing the exponential dependency on the dimension for additive approximations. This suggests that significant improvements to our algorithm are unlikely.
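
    The consistency objective can be evaluated directly for any candidate distribution. The sketch below does so for axis-aligned rectangular ranges in the plane and finite p; the names and the rectangle encoding are illustrative assumptions, and for p = infinity the definition would take a maximum rather than an average.

        def err_p(dist, workload, p=1):
            # dist: list of ((x, y), w) weighted points, with weights summing to 1
            # workload: list of ((x1, y1, x2, y2), s) ranges with observed selectivities
            def contains(rect, q):
                x1, y1, x2, y2 = rect
                return x1 <= q[0] <= x2 and y1 <= q[1] <= y2

            total = 0.0
            for rect, s in workload:
                est = sum(w for q, w in dist if contains(rect, q))
                total += abs(s - est) ** p
            return total / len(workload)

        if __name__ == "__main__":
            D = [((1, 1), 0.5), ((3, 3), 0.5)]
            Z = [((0, 0, 2, 2), 0.4), ((0, 0, 4, 4), 1.0)]
            print(err_p(D, Z, p=1))  # (|0.4 - 0.5| + |1.0 - 1.0|) / 2 = 0.05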

    Efficient Algorithms for k-Regret Minimizing Sets

    A regret minimizing set Q is a small-size representation of a much larger database P, so that user queries executed on Q return answers whose scores are not much worse than those on the full dataset. In particular, a k-regret minimizing set has the property that the regret ratio between the score of the top-1 item in Q and the score of the top-k item in P is minimized, where the score of an item is the inner product of the item's attributes with a user's weight (preference) vector. The problem is challenging because we want to find a single representative set Q whose regret ratio is small with respect to all possible user weight vectors. We show that k-regret minimization is NP-complete for all dimensions d >= 3, settling an open problem from Chester et al. [VLDB 2014]. Our main algorithmic contributions are two approximation algorithms, both with provable guarantees, one based on coresets and another based on hitting sets. We perform an extensive experimental evaluation of our algorithms, using both real-world and synthetic data, and compare their performance against the solution proposed in [VLDB 2014]. The results show that our algorithms are significantly faster and scale to much larger sets than the greedy algorithm of Chester et al. for answers of comparable quality.
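
    The objective being minimized can be spot-checked numerically. The sketch below estimates the k-regret ratio of a candidate subset Q against the full dataset P by sampling random nonnegative weight vectors; it is a Monte Carlo check of the definition with illustrative names, not the coreset or hitting-set algorithms from the paper.

        import random

        def kregret_ratio(P, Q, k, trials=1000, seed=0):
            # Estimate the worst-case k-regret ratio of Q over sampled weight vectors.
            rng = random.Random(seed)
            d = len(P[0])
            worst = 0.0
            for _ in range(trials):
                w = [rng.random() for _ in range(d)]
                scores = sorted((sum(wi * xi for wi, xi in zip(w, x)) for x in P),
                                reverse=True)
                top_k = scores[k - 1]                                          # top-k score in P
                top_1 = max(sum(wi * xi for wi, xi in zip(w, x)) for x in Q)   # top-1 score in Q
                if top_k > 0:
                    worst = max(worst, max(0.0, (top_k - top_1) / top_k))
            return worst

        if __name__ == "__main__":
            P = [(1, 9), (3, 7), (5, 5), (7, 3), (9, 1)]
            Q = [(1, 9), (9, 1)]
            print(kregret_ratio(P, Q, k=2))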

    Approximating Distance Measures for the Skyline

    In multi-parameter decision making, data is usually modeled as a set of points whose dimension is the number of parameters, and the skyline or Pareto points represent the possible optimal solutions for various optimization problems. The structure and computation of such points have been well studied, particularly in the database community. As the skyline can be quite large in high dimensions, one often seeks a compact summary. In particular, for a given integer parameter k, a subset of k points is desired which best approximates the skyline under some measure. Various measures have been proposed, but they mostly treat the skyline as a discrete object. By viewing the skyline as a continuous geometric hull, we propose a new measure that evaluates the quality of a subset by the Hausdorff distance of its hull to the full hull. We argue that in many ways our measure more naturally captures what it means to approximate the skyline. For our new geometric skyline approximation measure, we provide a plethora of results. Specifically, we provide (1) a near-linear time exact algorithm in two dimensions, (2) APX-hardness results for dimensions three and higher, (3) approximation algorithms for related variants of our problem, and (4) a practical and efficient heuristic which uses our geometric insights into the problem, as well as various experimental results to show the efficacy of our approach.
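
    The paper's measure is defined on the continuous hulls of the skyline and of the chosen subset; as a crude discrete stand-in with illustrative names, the sketch below computes a 2D max-skyline and the Hausdorff distance between the full skyline point set and a k-point subset of it.

        import math

        def skyline(points):
            # 2D max-skyline: points not dominated in both coordinates by another point.
            return [p for p in points
                    if not any(q[0] >= p[0] and q[1] >= p[1] and q != p for q in points)]

        def hausdorff(A, B):
            def directed(X, Y):
                return max(min(math.dist(x, y) for y in Y) for x in X)
            return max(directed(A, B), directed(B, A))

        if __name__ == "__main__":
            pts = [(1, 9), (3, 7), (5, 5), (7, 3), (9, 1), (2, 2)]
            sky = skyline(pts)
            subset = [sky[0], sky[-1]]       # keep only the two extreme skyline points
            print(hausdorff(sky, subset))    # how badly the subset approximates the skyline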

    Dynamic Enumeration of Similarity Joins
